Pranab Banerjee, Boston Fusion Corp., pranab.banerjee@bostonfusion.com
Approximately how many hours were spent working on this submission in total?
Approximately 35 hours.
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2017 is complete?
YES
Video
Questions
MC2.1 – Characterize the sensors’ performance and operation. Are they all working properly at all times? Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.
The provided data shows that no readings from any of the sensors
were registered between 11 PM on 4/30/2016 and 0:00 on
8/1/2016. This could indicate a malfunction of the sensor
network during that period.
This gap in the sensor readings was discovered by examining
the "normal" statistical pattern of the time intervals
between successive readings for each sensor. To compute the
intervals, the timestamps in the provided data file "Sensor
Data.xlsx" were converted from the specified date and time
format to Unix epoch format (number of seconds since 00:00:00
UTC on 1 January 1970). This made it easy to compute the
intervals between sensor readings in seconds. The histograms of
time intervals for all the sensors were essentially identical
and showed the pattern in Figure 1 below:
Figure 1: Histogram of time intervals between successive
sensor measurements for sensor 1. The bin frequency axis has been
truncated to an upper limit of 20.
Note that the y-axis in
Figure 1 has been truncated to a bin-frequency count of 20 to
emphasize the single outlier at the far right of the
histogram, a bin containing just one interval value. The
non-truncated histogram had a y-axis range of 0 - 2199,
corresponding to 2199 interval values in the first bin. The
median of the intervals in this bin was 3600 seconds, which
indicates that most of the time the sensor readings were
taken every 3600 seconds. The outlier time interval at the
extreme right of the histogram was found to be 7952000
seconds. A search for this interval in the original dataset
showed that it starts at 11 PM on 4/30/2016.
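The interval computation described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the submission: the timestamp format string, the helper names `to_epoch` and `largest_gap`, and the five sample timestamps are assumptions made for the example.

```python
from datetime import datetime, timezone

def to_epoch(stamp, fmt="%m/%d/%Y %H:%M"):
    """Convert a timestamp string to Unix epoch seconds, treating it as UTC.

    The format string is an assumption; adjust it to the actual file.
    """
    parsed = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())

def largest_gap(stamps):
    """Return (gap in seconds, start of gap) for the longest interval
    between successive readings."""
    epochs = sorted(to_epoch(s) for s in stamps)
    gaps = [(later - earlier, earlier) for earlier, later in zip(epochs, epochs[1:])]
    size, start = max(gaps)
    return size, datetime.fromtimestamp(start, tz=timezone.utc)

# Hypothetical sample: hourly readings on either side of the outage.
stamps = [
    "4/30/2016 21:00", "4/30/2016 22:00", "4/30/2016 23:00",
    "8/1/2016 0:00", "8/1/2016 1:00",
]
gap_seconds, gap_start = largest_gap(stamps)
print(gap_seconds, gap_start.isoformat())
```

For the two timestamps that bound the outage, the exact difference is 92 days and 1 hour, or 7,952,400 seconds, in line with the roughly 7.95 million second outlier seen in the histogram.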
Next, we looked at how the sensor readings relate to wind
direction and to wind speed. The goal is to determine the
sources of the different chemicals by correlating the wind
directions corresponding to high sensor readings with the
directions between [sensor, factory] pairs. To this end, we
first computed the following list of directions between the
sensors and factories, using the same angle measurement scheme
as the wind directions specified in the provided data (North
being 360/0 degrees):
Table 1: List of directional angles for pairwise
combinations of factory and sensor
Sensor | Factory | Angle (deg) |
1 | Roadrunner | 77.4711922908485 |
1 | Kasios | 90 |
1 | Radiance | 87.9098408462893 |
1 | Indigo | 89.0122396003602 |
2 | Roadrunner | 109.179008025811 |
2 | Kasios | 120.256437163529 |
2 | Radiance | 93.8712562319856 |
2 | Indigo | 103.535856369134 |
3 | Roadrunner | 137.121096396661 |
3 | Kasios | 145.007979801441 |
3 | Radiance | 96.9529574681739 |
3 | Indigo | 113.355564859286 |
4 | Roadrunner | 176.820169880136 |
4 | Kasios | 175.236358309274 |
4 | Radiance | 99.7132510249403 |
4 | Indigo | 125.706691400603 |
5 | Roadrunner | 221.18592516571 |
5 | Kasios | 210.579226872489 |
5 | Radiance | 100.04202363553 |
5 | Indigo | 141.009005957495 |
6 | Roadrunner | 248.962488974578 |
6 | Kasios | 265.236358309274 |
6 | Radiance | 87.6386253418244 |
6 | Indigo | 90 |
7 | Roadrunner | 0 |
7 | Kasios | 3.17983011986423 |
7 | Radiance | 78.1901170429717 |
7 | Indigo | 58.4957332807958 |
8 | Roadrunner | 36.869897645844 |
8 | Kasios | 48.8140748342903 |
8 | Radiance | 81.3571974199955 |
8 | Indigo | 71.9395280638008 |
9 | Roadrunner | 243.434948822922 |
9 | Kasios | 234.090276920822 |
9 | Radiance | 101.30993247402 |
9 | Indigo | 177.137594773888 |
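As a rough sketch of how angles such as those in Table 1 can be obtained, the snippet below computes a compass bearing (0/360 = North, increasing clockwise) from one map point to another. The coordinates are not given in this text; the values used here are illustrative assumptions, chosen so that the results reproduce the two sensor 1 entries for Roadrunner and Kasios. The function name `bearing_deg` is likewise hypothetical.

```python
import math

def bearing_deg(origin, target):
    """Compass bearing in degrees from origin to target.

    Points are (x, y) with x increasing to the east and y to the north.
    North is 0 (equivalently 360) and angles increase clockwise.
    """
    dx = target[0] - origin[0]
    dy = target[1] - origin[1]
    # atan2(dx, dy) measures the angle from the +y (north) axis, clockwise.
    return math.degrees(math.atan2(dx, dy)) % 360.0

# Assumed, illustrative map coordinates (not taken from this text).
sensor_1 = (62, 21)
roadrunner = (89, 27)
kasios = (90, 21)

print(bearing_deg(sensor_1, roadrunner))  # close to 77.47
print(bearing_deg(sensor_1, kasios))      # close to 90
```

Using `atan2` with the arguments in (dx, dy) order, rather than the usual (dy, dx), is what converts the mathematical angle convention into a compass bearing.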
Apart from the interval histogram shown in Figure 1 above,
anomalies were also observed in the sensor readings for several
of the chemicals. Two of the visualizations that clearly
showed these anomalies are (i) the sensor reading vs. wind
direction plot, and (ii) the sensor reading vs. wind speed plot.
These plots were created for each [sensor, chemical]
combination. For lack of space, we show only two such sets of
plots, in Figures 2 and 3 below, for sensors 2 and 6. The
four plots in the left column of each figure show the
relation between sensor reading and wind direction for all
four chemicals, and the plots in the right column show the
relation between sensor reading and wind speed.
Figure 2 shows plots for sensor #2. This figure clearly shows
the anomalies in the sensor readings. For example, the two
plots in the top row show three anomalous readings,
corresponding to the three isolated spikes at
wind directions of 109.13, 130.20 and 227.34 degrees, and wind
speeds of 0.76, 1.10 and 2.56 m/s respectively. The timestamps
for these spikes are "2016-08-20 05:00:00 UTC",
"2016-04-17 03:00:00 UTC", and "2016-08-02 04:00:00 UTC"
respectively.
A further five spikes occur at wind directions of 113.80,
155.20, 227.68, 228.02, and 228.05 degrees, with corresponding
timestamps of 2016-08-20 06:00:00 UTC, 2016-12-05 06:00:00 UTC,
2016-08-01 19:00:00 UTC, 2016-08-01 10:00:00 UTC, and
2016-08-01 09:00:00 UTC respectively. Note that the plots in
Figure 2 appear to show only three spikes instead of five.
This is because the x-axis is compressed here to save space,
and some of the neighboring spikes overlap.
Figure 2: Plots of sensor reading vs. wind direction
and wind speed for sensor #2
It is worth noting that a set of spikes with a Gaussian
fall-off around the highest peak most likely indicates valid
readings, since we expect gradually decreasing amounts of a
chemical to reach a sensor as the wind direction deviates
from the direction of the source as seen from the sensor. A
single isolated spike, on the other hand, most likely points to
a sensor anomaly. As an example, consider the top three plots
in the left column of Figure 3 below, which show readings for
sensor #6. Here the peaks have an approximately Gaussian
fall-off, indicating that they most likely correspond to valid
readings. However, the single isolated high spike in the
reading for the chemical Appluimonia at a wind direction of
227.18 degrees, corresponding to a timestamp of
2016-08-02 08:00:00 UTC, is suspect and could be linked to a
sensor anomaly.
Figure 3: Plots of sensor reading vs. wind direction
and wind speed for sensor #6
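The distinction drawn above, between a peak with a gradual fall-off and a single isolated spike, can be illustrated with a simple neighborhood check. This is only a heuristic sketch and not the method used in the submission: the function names, the 15 degree window, the 30% support ratio, and the sample data are all assumptions.

```python
def circular_diff(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def isolated_spikes(directions, readings, threshold,
                    window_deg=15.0, support_ratio=0.3):
    """Return indices of high readings with no elevated neighbor.

    A reading at or above `threshold` counts as supported if some other
    reading within `window_deg` of its wind direction reaches at least
    `support_ratio` times its value, as would be expected on the
    shoulders of a gradual peak.
    """
    spikes = []
    for i, (di, ri) in enumerate(zip(directions, readings)):
        if ri < threshold:
            continue
        supported = any(
            j != i
            and circular_diff(di, dj) <= window_deg
            and rj >= support_ratio * ri
            for j, (dj, rj) in enumerate(zip(directions, readings))
        )
        if not supported:
            spikes.append(i)
    return spikes

# Hypothetical data: a gradual peak near 90 degrees and a lone spike at 227.
directions = [80.0, 85.0, 90.0, 95.0, 100.0, 227.0, 300.0]
readings = [1.0, 3.0, 6.0, 3.2, 1.1, 7.0, 0.2]
print(isolated_spikes(directions, readings, threshold=5.0))
```

With this sample, the reading at 90 degrees is supported by its neighbors at 85 and 95 degrees, while the reading at 227 degrees has no neighbors within the window and is flagged.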
A Dirichlet process based machine learning algorithm was
developed to automatically detect such isolated spikes. Based
on a combination of this machine learning algorithm and visual
analytics using plots like those in Figures 2 and 3, the
following unexpected sensor readings were identified:
Sensor | Interpolated Angle (deg) | Interpolated Speed (m/s) | Date Time | Chemical |
1 | 88.90 | 0.6333333 | 2016-12-07 01:00:00 UTC | Methylosmolene |
1 | 155.20 | 0.8 | 2016-12-05 06:00:00 UTC | AGOC-3A |
1 | 227.19 | 2.56 | 2016-08-02 08:00:00 UTC | Appluimonia |
1 | 104.47 | 0.53 | 2016-08-20 04:00:00 UTC | Appluimonia |
1 | 109.13 | 0.76 | 2016-08-20 05:00:00 UTC | Appluimonia |
2 | 227.34 | 2.56 | 2016-08-02 04:00:00 UTC | Methylosmolene |
2 | 227.26 | 2.56 | 2016-08-02 06:00:00 UTC | Chlorodinine |
2 | 228.06 | 2.55 | 2016-08-01 09:00:00 UTC | AGOC-3A |
2 | 228.02 | 2.55 | 2016-08-01 10:00:00 UTC | AGOC-3A |
2 | 227.68 | 2.55 | 2016-08-01 19:00:00 UTC | AGOC-3A |
2 | 155.20 | 0.80 | 2016-12-05 06:00:00 UTC | AGOC-3A |
2 | 303.60 | 1.20 | 2016-04-04 13:00:00 UTC | Appluimonia |
2 | 125.57 | 0.67 | 2016-04-14 07:00:00 UTC | Appluimonia |
2 | 227.22 | 2.56 | 2016-08-02 07:00:00 UTC | Appluimonia |
2 | 335.83 | 1.13 | 2016-12-06 19:00:00 UTC | Appluimonia |
3 | 227.53 | 2.56 | 2016-08-01 23:00:00 UTC | Methylosmolene |
3 | Almost all readings for wind direction about 145 degrees | Chlorodinine |
3 | 227.86 | 2.55 | 2016-08-01 14:00:00 UTC | AGOC-3A |
3 | 227.60 | 2.55 | 2016-08-01 14:00:00 UTC | AGOC-3A |
3 | This sensor is unreliable and noisy for detecting this chemical | Appluimonia |
4 | This sensor is unreliable and noisy for detecting this chemical | Appluimonia |
5 | 158.60 | 0.60 | 2016-08-12 06:00:00 UTC | AGOC-3A |
5 | 158.60 | 0.60 | 2016-08-12 06:00:00 UTC | Chlorodinine |
5 | 155.20 | 0.80 | 2016-12-05 06:00:00 UTC | Appluimonia |
6 | 227.18 | 2.5 | 2016-08-02 08:00:00 UTC | Appluimonia |
7 | 124.83 | 0.43 | 2016-04-19 01:00:00 UTC | Methylosmolene |
7 | 239.87 | 0.47 | 2016-04-19 02:00:00 UTC | Methylosmolene |
7 | 357.17 | 0.77 | 2016-04-19 05:00:00 UTC | Methylosmolene |
7 | 354.80 | 0.80 | 2016-04-14 15:00:00 UTC | AGOC-3A |
7 | 358.30 | 0.90 | 2016-04-19 06:00:00 UTC | AGOC-3A |
8 | 254.26 | 0.57 | 2016-04-29 05:00:00 UTC | Chlorodinine |
8 | 59.00 | 0.80 | 2016-04-16 12:00:00 UTC | Appluimonia |
9 | 158.9 | 0.6 | 2016-04-11 03:00:00 UTC | Methylosmolene |
9 | This sensor is unreliable and noisy for detecting this chemical | Chlorodinine |
9 | 273.97 | 0.97 | 2016-12-15 10:00:00 UTC | AGOC-3A |
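The submission does not say which Dirichlet process algorithm was used to flag isolated spikes. As one possible stand-in, the sketch below uses DP-means (Kulis and Jordan), a hard-clustering limit of the Dirichlet process mixture that opens a new cluster whenever a point lies farther than a penalty `lam` (in squared distance) from every existing center. Readings that end up alone in a cluster are then candidates for isolated spikes. The function names, the value of `lam`, the choice to cluster the one-dimensional reading values, and the sample data are all assumptions for illustration.

```python
from collections import Counter

def dp_means(values, lam, max_iter=100):
    """Cluster 1-D values with DP-means.

    `lam` is the squared-distance penalty for opening a new cluster.
    Returns (labels, centers).
    """
    centers = [sum(values) / len(values)]  # start with one global cluster
    labels = [0] * len(values)
    for _ in range(max_iter):
        changed = False
        for i, x in enumerate(values):
            dists = [(x - c) ** 2 for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                # Too far from every center: open a new cluster at x.
                centers.append(x)
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Recompute centers as member means, dropping empty clusters.
        members = {}
        for x, k in zip(values, labels):
            members.setdefault(k, []).append(x)
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centers = [sum(members[old]) / len(members[old]) for old in kept]
        labels = [remap[k] for k in labels]
        if not changed:
            break
    return labels, centers

def singleton_indices(labels):
    """Indices of readings that are the only member of their cluster."""
    counts = Counter(labels)
    return [i for i, k in enumerate(labels) if counts[k] == 1]

# Hypothetical readings: baseline values, one genuine peak, one lone spike.
sample = [0.1, 0.2, 0.15, 0.12, 0.18, 3.0, 3.2, 2.9, 9.5]
labels, centers = dp_means(sample, lam=2.0)
print(singleton_indices(labels))
```

DP-means depends on the order in which points are visited and on the choice of `lam`, so in practice the flagged readings would still need to be checked visually, as was done with the plots above.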
MC2.2 – Now turn your attention to the chemicals themselves. Which chemicals are being detected by the sensor group? What patterns of chemical releases do you see, as being reported in the data? Limit your response to no more than 6 images and 500 words.
The sensor reading vs. wind direction plots mentioned above show
the detected chemicals as higher-than-baseline sensor readings.
Here are 6 such plots (in addition to the two shown in Figures 2
and 3 above).
Figure 4: Plots of sensor reading vs. wind direction and wind speed for sensor #1
Figure 5: Plots of sensor reading vs. wind direction and wind speed for sensor #3
Figure 6: Plots of sensor reading vs. wind direction and wind speed for sensor #4
Figure 7: Plots of sensor reading vs. wind direction and wind speed for sensor #5
Figure 8: Plots of sensor reading vs. wind direction and wind speed for sensor #7
Figure 9: Plots of sensor reading vs. wind direction and wind speed for sensor #8
Chemical detection by sensor
A Dirichlet process based unsupervised clustering algorithm was
developed to automatically detect clusters of such significant
sensor readings. Combining the output of this clustering
algorithm with visual analysis of the sensor reading vs. wind
direction plots (Figures 4 through 9 above) gives the following
information about the chemicals being detected by the deployed
sensors:
Sensor 1 is detecting: Chlorodinine, AGOC-3A, Appluimonia
Sensor 2 is detecting: Methylosmolene, Chlorodinine, AGOC-3A
Sensor 3 is detecting: Methylosmolene, AGOC-3A
Sensor 4 is detecting: Methylosmolene, Chlorodinine, AGOC-3A
Sensor 5 is detecting: Methylosmolene, Chlorodinine, AGOC-3A, Appluimonia
Sensor 6 is detecting: Methylosmolene, Chlorodinine, AGOC-3A, Appluimonia
Sensor 7 is detecting: Methylosmolene, Chlorodinine, AGOC-3A
Sensor 8 is detecting: Methylosmolene, Chlorodinine, AGOC-3A
Sensor 9 is detecting: Methylosmolene, AGOC-3A, Appluimonia
MC2.3 – Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data. Limit your response to no more than 8 images and 1000 words.
The chemicals released by each factory were determined by
correlating the wind directions of significant sensor readings
for each sensor with the factory-to-sensor angles listed in
Table 1. An unsupervised clustering algorithm was developed to
detect significant sensor readings above the baseline for each
sensor-chemical pair. Singleton clusters were eliminated as
outliers. The largest cluster (corresponding to baseline
readings) was also rejected. The direction for each remaining
cluster was taken to be the wind direction of the sensor
measurement closest to the center of the cluster. The factory
most closely aligned with this direction from the sensor was
then assigned the chemical corresponding to the cluster. The
results obtained by this algorithm are shown in Table 3.
Table 3: List of chemicals from factories and the
sensors that detect them
Factory | Chemical | Sensor Used for determination |
Indigo | Methylosmolene | Sensor 8 |
Indigo | Chlorodinine | Sensor 2 |
Indigo | AGOC-3A | Sensor 9 |
Indigo | Appluimonia | Sensor 5, Sensor 6, Sensor 9 |
Kasios | Methylosmolene | Sensor 3, Sensor 4, Sensor 5, Sensor 6, Sensor 7, Sensor 8, Sensor 9 |
Kasios | Chlorodinine | Sensor 1, Sensor 4, Sensor 5, Sensor 6, Sensor 7, Sensor 8 |
Kasios | AGOC-3A | Sensor 1, Sensor 3, Sensor 4, Sensor 5, Sensor 6, Sensor 7, Sensor 8, Sensor 9 |
Radiance | Methylosmolene | Sensor 7 |
Radiance | Chlorodinine | Sensor 7 |
Radiance | AGOC-3A | Sensor 6 |
Roadrunner | Methylosmolene | Sensor 2, Sensor 3, Sensor 4, Sensor 5 |
Roadrunner | Chlorodinine | Sensor 1, Sensor 4, Sensor 5, Sensor 8 |
Roadrunner | Appluimonia | Sensor 1, Sensor 5 |
Roadrunner | AGOC-3A | Sensor 2, Sensor 3, Sensor 4, Sensor 5, Sensor 6, Sensor 8, Sensor 9 |
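The cluster-to-factory assignment described before Table 3 can be sketched as follows. This is an illustration under stated assumptions rather than the submitted code: it assumes that clusters were formed on the reading values, so that "closest to the center" means closest in reading value, and the function names, cluster labels, readings, and wind directions are made up. The factory angles are the sensor 9 entries of Table 1, rounded to two decimals.

```python
from collections import defaultdict

def circular_diff(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def assign_factories(labels, readings, directions, factory_angles):
    """Map each significant cluster to (factory, representative direction).

    Singleton clusters are dropped as outliers and the largest cluster
    is dropped as the baseline, following the description above.
    """
    clusters = defaultdict(list)
    for i, k in enumerate(labels):
        clusters[k].append(i)
    baseline = max(clusters, key=lambda k: len(clusters[k]))
    result = {}
    for k, idx in clusters.items():
        if k == baseline or len(idx) == 1:
            continue
        center = sum(readings[i] for i in idx) / len(idx)
        # Representative measurement: the member closest to the center.
        rep = min(idx, key=lambda i: abs(readings[i] - center))
        factory = min(
            factory_angles,
            key=lambda f: circular_diff(directions[rep], factory_angles[f]),
        )
        result[k] = (factory, directions[rep])
    return result

# Sensor 9 angles from Table 1, rounded to two decimals.
sensor_9_angles = {
    "Roadrunner": 243.43, "Kasios": 234.09,
    "Radiance": 101.31, "Indigo": 177.14,
}
# Hypothetical clustering output for one sensor-chemical pair.
labels = [0, 0, 0, 0, 1, 1, 1, 2]
readings = [0.1, 0.2, 0.1, 0.15, 3.0, 3.4, 3.1, 9.0]
directions = [10.0, 300.0, 45.0, 200.0, 233.0, 236.0, 235.0, 227.2]
print(assign_factories(labels, readings, directions, sensor_9_angles))
```

In this example cluster 0 is discarded as the baseline and cluster 2 as a singleton. The representative of cluster 1 has a wind direction of 235.0 degrees, which is closest to the Kasios angle. Where two factories lie at nearly the same angle from a sensor, as for several rows of Table 1, a nearest-angle rule of this kind cannot separate them reliably.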
It is highly likely that the factory Kasios
emits Methylosmolene, Chlorodinine, and AGOC-3A, since these are
being detected by a large number of sensors.
The factory Radiance does not emit Appluimonia. Since only
sensor #7 detects Methylosmolene and Chlorodinine from the
direction of Radiance, and sensor #6, which is much closer to
Radiance, does not, these detections by sensor #7 are most
likely erroneous. Since AGOC-3A from Radiance is detected only
by the closest sensor, #6, and not by any other sensor, the
level of AGOC-3A emission from this factory is probably low.
The factory Roadrunner seems to be emitting all four chemicals,
since each of them is detected by multiple sensors.
Factory patterns of operation
To characterize the patterns of
operation of the factories, the sensor readings for each
factory, taken from the sensors that detect it (as shown in
Table 3), were computed over time of day irrespective of
the date. Figures 10 through 17 show such plots for the
different factory-chemical combinations: